1 A language for describing predictors and its application to automatic synthesis
Joel Emer, Nikolas Gloy

May 1997 ACM SIGARCH Computer Architecture News, Proceedings of the 24th annual international symposium on Computer architecture ISCA '97, Volume 25 Issue 2
Publisher: ACM Press

Full text available: pdf(1.51 MB) Additional Information: full citation, abstract, references, citings, index terms



As processor architectures have increased their reliance on speculative execution to 
improve performance, the importance of accurate prediction of what to execute 
speculatively has increased. Furthermore, the types of values predicted have expanded 
from the ubiquitous branch and call/return targets to the prediction of indirect jump 
targets, cache ways and data values. In general, the prediction process is one of 
identifying the current state of the system, and making a prediction for some as ye ... 

2 Genetic programming: Meta-grammar constant creation with grammatical evolution by grammatical evolution
Ian Dempsey, Michael O'Neill, Anthony Brabazon

June 2005 Proceedings of the 2005 conference on Genetic and evolutionary computation GECCO '05
Publisher: ACM Press

Full text available: pdf(205.51 KB) Additional Information: full citation, abstract, references, index terms

This study examines the utility of meta-grammar constant generation on a series of 
benchmark problems. The performance of the meta-grammar approach is compared to a 
grammar which incorporates grammatical ephemeral random constants, digit 
concatenation, and an expression based approach. It is found that the meta-grammar 
approach to constant creation is particularly beneficial on the dynamic problem instances 
in terms of the best fitness values achieved. 

Keywords: constant creation, digit concatenation, ephemeral random constants, genetic 
programming, grammatical evolution, meta-grammars 



3 Meta optimization: improving compiler heuristics with machine learning
Mark Stephenson, Saman Amarasinghe, Martin Martin, Una-May O'Reilly

May 2003 ACM SIGPLAN Notices, Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation PLDI '03, Volume 38 Issue 5
Publisher: ACM Press

Full text available: pdf(302.23 KB) Additional Information: full citation, abstract, references, citings, index terms

Compiler writers have crafted many heuristics over the years to approximately solve NP-hard
problems efficiently. Finding a heuristic that performs well on a broad range of
applications is a tedious and difficult process. This paper introduces Meta Optimization, a 
methodology for automatically fine-tuning compiler heuristics. Meta Optimization uses 
machine-learning techniques to automatically search the space of compiler heuristics. Our 
techniques reduce compiler design complexity by relieving c ... 

Keywords: compiler heuristics, genetic programming, machine learning, priority functions 



4 Distributing collective adaptation via message passing
Thomas Haynes

February 1999 Proceedings of the 1999 ACM symposium on Applied computing
Publisher: ACM Press

Full text available: pdf(546.12 KB) Additional Information: full citation, references, citings, index terms



Keywords: collective adaptation, distributed PC clusters, genetic programmed, linux, 
message passing interface 



5 A novel ensemble-based scoring and search algorithm for protein redesign, and its application to modify the substrate specificity of the gramicidin synthetase A phenylalanine adenylation enzyme
Ryan H. Lilien, Brian W. Stevens, Amy C. Anderson, Bruce R. Donald

March 2004 Proceedings of the eighth annual international conference on Research in computational molecular biology RECOMB '04
Publisher: ACM Press

Full text available: pdf(2.36 MB) Additional Information: full citation, abstract, references, index terms

Realization of novel molecular function requires the ability to alter molecular complex 
formation. Enzymatic function can be altered by changing enzyme-substrate interactions 
via modification of an enzyme's active site. A redesigned enzyme may either perform a 
novel reaction on its native substrates or its native reaction on novel substrates. A number 
of computational approaches have been developed to address the combinatorial nature of 
the protein redesign problem. These approaches typically se ... 

Keywords: enzyme design, fluorescence binding assay, molecular ensemble, non- 
ribosomal peptide synthetase, protein design, protein flexibility, protein-ligand binding 



6 Optimizing Sorting with Genetic Algorithms
Xiaoming Li, Maria Jesus Garzaran, David Padua

March 2005 Proceedings of the international symposium on Code generation and optimization CGO '05
Publisher: IEEE Computer Society

Full text available: pdf(275.30 KB) Additional Information: full citation, abstract

The growing complexity of modern processors has made the generation of highly efficient 
code increasingly difficult. Manual code generation is very time consuming, but it is often 
the only choice since the code generated by today's compiler technology often has much 
lower performance than the best hand-tuned codes. A promising code generation strategy, 
implemented by systems like ATLAS, FFTW, and SPIRAL, uses empirical search to find the 
parameter values of the implementation, such as the tile s ... 

7 Computational models: BLOB computing
Frederic Gruau, Yves Lhuillier, Philippe Reitz, Olivier Temam

April 2004 Proceedings of the 1st conference on Computing frontiers
Publisher: ACM Press

Full text available: pdf(1.02 MB) Additional Information: full citation, abstract, references, index terms

Current processor and multiprocessor architectures are almost all based on the Von 
Neumann paradigm. Based on this paradigm, one can build a general-purpose computer 
using very few transistors, e.g., 2250 transistors in the first Intel 4004 microprocessor. In 
other terms, the notion that on-chip space is a scarce resource is at the root of this 
paradigm which trades on-chip space for program execution time. Today, technology 
considerably relaxed this space constraint. Still, few research works q ... 

Keywords: bio-inspiration, cellular automata, scalable architectures 



8 Parallel program performance prediction using deterministic task graph analysis
Vikram S. Adve, Mary K. Vernon

February 2004 ACM Transactions on Computer Systems (TOCS), Volume 22 Issue 1
Publisher: ACM Press

Full text available: pdf(576.29 KB) Additional Information: full citation, abstract, references, index terms, review

In this article, we consider analytical techniques for predicting detailed performance 
characteristics of a single shared memory parallel program for a particular input. Analytical 
models for parallel programs have been successful at providing simple qualitative insights 
and bounds on program scalability, but have been less successful in practice for providing 
detailed insights and metrics for program performance (leaving these to measurement or 
simulation). We develop a conceptually simple mode ... 

Keywords: Analytical model, deterministic model, parallel program performance 
prediction, queueing network, shared memory, task graph, task scheduling 



9 Final shift for call/cc: direct implementation of shift and reset
Martin Gasbichler, Michael Sperber

September 2002 ACM SIGPLAN Notices, Proceedings of the seventh ACM SIGPLAN international conference on Functional programming ICFP '02, Volume 37 Issue 9
Publisher: ACM Press

Full text available: pdf(271.99 KB) Additional Information: full citation, abstract, references, citings, index terms

We present a direct implementation of the shift and reset control operators in the SFE 
system. The new implementation improves upon the traditional technique of simulating 
shift and reset via callcc. Typical applications of these operators exhibit space savings and 
a significant overall performance gain. Our technique is based upon the popular 
incremental stack/heap strategy for representing continuations. We present 
implementation details as well as som ... 

Keywords: continuations, implementation, scheme 



10 Automated design of finite state machine predictors for customized processors
Timothy Sherwood, Brad Calder

May 2001 ACM SIGARCH Computer Architecture News, Proceedings of the 28th annual international symposium on Computer architecture ISCA '01, Volume 29 Issue 2
Publisher: ACM Press

Full text available: pdf(914.12 KB) Additional Information: full citation, abstract, references, citings, index terms

Customized processors use compiler analysis and design automation techniques to take a
generalized architectural model and create a specific instance of it which is optimized to a
given application or set of applications. These processors offer the promise of satisfying
the high performance needs of the embedded community while simultaneously shrinking
design times. 

Finite State Machines (FSM) are a fundamental building block in computer architecture, 
and are used to control ... 

11 Multi Relational Data Mining (MRDM): Scalability and efficiency in multi-relational data mining
Hendrik Blockeel, Michele Sebag

July 2003 ACM SIGKDD Explorations Newsletter, Volume 5 Issue 1
Publisher: ACM Press

Full text available: pdf(1.61 MB) Additional Information: full citation, abstract, references, citings

Efficiency and Scalability have always been important concerns in the field of data mining, 
and are even more so in the multi-relational context, which is inherently more complex. 
The issue has been receiving an increasing amount of attention during the last few years, 
and quite a number of theoretical results, algorithms and implementations have been 
presented that explicitly aim at improving the efficiency and Scalability of multi-relational 
data mining approaches. With this article we attempt ... 



12 Distributed collective adaptation applied to a hard combinatorial optimization problem
Thomas Haynes

February 1999 Proceedings of the 1999 ACM symposium on Applied computing
Publisher: ACM Press

Full text available: pdf(562.92 KB) Additional Information: full citation, references, citings, index terms



Keywords: collective adaptation, distributed search 



13 GWS contributions: Constant generation for the financial domain using grammatical evolution
Ian Dempsey

June 2005 Proceedings of the 2005 workshops on Genetic and evolutionary computation GECCO '05
Publisher: ACM Press

Full text available: pdf(95.17 KB) Additional Information: full citation, abstract, references, index terms

This study reports the work to date on the analysis of different methodologies for constant 
creation with the aim of applying the most advantageous method to the dynamic real 
world problem of a live trading system. The methodologies explored here are Digit 
Concatenation and Grammatical Ephemeral Random Constants with clear advantages 
identified for a digit concatenation approach in combination with the ability to form new
constants through their recombination within expressions. 

Keywords: constant creation, digit concatenation, genetic programming, grammatical 
evolution 




14 Automatic composition of music by means of grammatical evolution
Alfonso Ortega de la Puente, Rafael Sanchez Alfonso, Manuel Alfonseca Moreno

June 2002 ACM SIGAPL APL Quote Quad, Proceedings of the 2002 conference on APL: array processing languages: lore, problems, and applications APL '02, Volume 32 Issue 4
Publisher: ACM Press

Full text available: pdf(191.13 KB) Additional Information: full citation, abstract, references

This work describes how grammatical evolution may be applied to the domain of automatic
composition. Our goal is to test this technique as an alternate tool for automatic
composition. The AP440 auxiliary processor will be used to play music, thus we shall use a
grammar that generates AP440 melodies. Grammar evolution will use fitness functions 
defined from several well-known single melodies to automatically generate AP440 
compositions that are expected to sound like those composed by human music ... 

15 BioGEC contributions: The evolutionary computation approach to motif discovery in biological sequences
Michael A. Lones, Andy M. Tyrrell

June 2005 Proceedings of the 2005 workshops on Genetic and evolutionary computation GECCO '05
Publisher: ACM Press

Full text available: pdf(348.60 KB) Additional Information: full citation, abstract, references, index terms

Finding motifs — patterns of conserved residues — within nucleotide and protein 
sequences is a key part of understanding function and regulation within biological systems. 
This paper presents a review of current approaches to motif discovery, both evolutionary 
computation based and otherwise, and a speculative look at the advantages of the 
evolutionary computation approach and where it might lead us in the future. Particular 
attention is given to the problem of characterising regulat ... 

Keywords: biological sequence understanding, evolutionary computation, motif discovery 



16 Writing the web: Mining topic-specific concepts and definitions on the web
Bing Liu, Chee Wee Chin, Hwee Tou Ng

May 2003 Proceedings of the 12th international conference on World Wide Web
Publisher: ACM Press

Full text available: pdf(245.66 KB) Additional Information: full citation, abstract, references, citings, index terms

Traditionally, when one wants to learn about a particular topic, one reads a book or a 
survey paper. With the rapid expansion of the Web, learning in-depth knowledge about a 
topic from the Web is becoming increasingly important and popular. This is also due to the 
Web's convenience and its richness of information. In many cases, learning from the Web 
may even be essential because in our fast changing world, emerging topics appear 
constantly and rapidly. There is often not enough time for someone ... 

Keywords: definition mining, domain concept mining, information integration, knowledge 
compilation, web content mining 



17 Power modeling and optimization for embedded systems: Automated energy/performance macromodeling of embedded software
Anish Muttreja, Anand Raghunathan, Srivaths Ravi, Niraj K. Jha

June 2004 Proceedings of the 41st annual conference on Design automation
Publisher: ACM Press

Additional Information: full citation, abstract, references, index terms, review

Efficient energy and performance estimation of embedded software is a critical part of any 
system-level design flow. Macromodeling based estimation is an attempt to speed up 
estimation by exploiting reuse that is inherent in the design process. Macromodeling 
involves pre-characterizing reusable software components to construct high-level models, 
which express the execution time or energy consumption of a sub-program as a function 
of suitable parameters. During simulation, macromodels can be used ... 

Keywords: data serialization, embedded software, genetic programming, macromodeling,
regression, symbolic

18 Bibliography of recent publications on computer communication
Martha Steenstrup

January 1998 ACM SIGCOMM Computer Communication Review, Volume 28 Issue 1
Publisher: ACM Press

Full text available: pdf(2.02 MB) Additional Information: full citation, abstract, index terms



The quantitative results presented in our SIGCOMM '97 paper [1] include numerous minor 
errors. These errors were caused by programming bugs that led to faulty analyses and 
simulations, and by inaccurate transcriptions during the preparation of the paper. Here we 
present corrected figures and tables, as well as corrections to values that appeared in the 
text of the original paper. The effect of correcting the errors is to reduce the differences 
between the results based on the proxy trace and tho ... 

19 Automatically structured and translated queries: The effectiveness of automatically structured queries in digital libraries
Marcos Andre Gonçalves, Edward A. Fox, Aaron Krowne, Pavel Calado, Alberto H. F. Laender,
Altigran S. da Silva, Berthier Ribeiro-Neto

June 2004 Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries
Publisher: ACM Press

Full text available: pdf(295.40 KB) Additional Information: full citation, abstract, references, index terms

Structured or fielded metadata is the basis for many digital library services, including 
searching and browsing. Yet, little is known about the impact of using structure on the 
effectiveness of such services. In this paper, we investigate a key research question: do 
structured queries improve effectiveness in DL searching? To answer this question, we 
empirically compared the use of unstructured queries to the use of structured queries. We 
then tested the capability of a simple Bayesian network s ... 

Keywords: bayesian networks, digital libraries, structured queries 



20 The dynamic servers problem
Moses Charikar, Dan Halperin, Rajeev Motwani

January 1998 Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms
Publisher: Society for Industrial and Applied Mathematics

Full text available: pdf(1.26 MB) Additional Information: full citation, references, citings, index terms

21 Compilers: Adaptive java optimisation using instance-based learning
Shun Long, Michael O'Boyle

June 2004 Proceedings of the 18th annual international conference on Supercomputing
Publisher: ACM Press

Full text available: pdf(231.16 KB) Additional Information: full citation, abstract, references, index terms

This paper describes a portable, machine learning-based approach to Java optimisation. 
This approach uses an instance-based learning scheme to select good transformations 
drawn from Pugh's Unified Transformation Framework [11]. This approach was
implemented and applied to a number of numerical Java benchmarks on two platforms. 
Using this scheme, we are able to gain over 70% of the performance improvement found 
when using an exhaustive iterative search of the best compiler optimisations. Thus we ... 



Keywords: adaptive optimisation, instance-based learning, java, optimisation space 



22 Predicting Unroll Factors Using Supervised Classification
Mark Stephenson, Saman Amarasinghe

March 2005 Proceedings of the international symposium on Code generation and optimization CGO '05
Publisher: IEEE Computer Society

Full text available: pdf(214.87 KB) Additional Information: full citation, abstract

Compilers base many critical decisions on abstracted architectural models. While recent 
research has shown that modeling is effective for some compiler problems, building 
accurate models requires a great deal of human time and effort. This paper describes how 
machine learning techniques can be leveraged to help compiler writers model complex 
systems. Because learning techniques can effectively make sense of high dimensional 
spaces, they can be a valuable tool for clarifying and discerning comple ... 

23 Automatic Tuning of Inlining Heuristics
John Cavazos, Michael F. P. O'Boyle

November 2005 Proceedings of the 2005 ACM/IEEE conference on Supercomputing SC '05
Publisher: IEEE Computer Society

Full text available: pdf(328.72 KB), Publisher Site Additional Information: full citation, abstract

Inlining improves the performance of programs by reducing the overhead of method
invocation and increasing the opportunities for compiler optimization. Incorrect inlining
decisions, however, can degrade both the running and compilation time of a program. This
is especially important for a dynamically compiled language such as Java. Therefore, the
heuristics that control inlining must be carefully tuned to achieve a good balance between 
these two costs to reduce overall total execution time. This ... 

24 The use of dynamic contexts to improve casual internet searching
Gondy Leroy, Ann M. Lally, Hsinchun Chen

July 2003 ACM Transactions on Information Systems (TOIS), Volume 21 Issue 3
Publisher: ACM Press

Full text available: pdf(231.61 KB) Additional Information: full citation, abstract, references, index terms

Research has shown that most users' online information searches are suboptimal. Query 
optimization based on a relevance feedback or genetic algorithm using dynamic query 
contexts can help casual users search the Internet. These algorithms can draw on implicit 
user feedback based on the surrounding links and text in a search engine result set to 
expand user queries with a variable number of keywords in two manners. Positive 
expansion adds terms to a user's keywords with a Boolean "and," negative ... 

Keywords: Information retrieval, Internet, automatic query expansion, genetic algorithm, 
implicit user feedback, personalization, relevance feedback 




25 Test: Automatic generation of test sets for SBST of microprocessor IP cores
E. Sanchez, M. Sonza Reorda, G. Squillero, M. Violante

September 2005 Proceedings of the 18th annual symposium on Integrated circuits and system design SBCCI '05
Publisher: ACM Press

Full text available: pdf(258.50 KB) Additional Information: full citation, abstract, references, index terms

Higher integration densities, smaller feature lengths, and other technology advances, as 
well as architectural evolution, have made microprocessor cores exceptionally complex. 
Currently, Software-Based Self-Test (SBST) is becoming an attractive test solution since it 
guarantees high fault coverage figures, runs at-speed, and matches core test 
requirements while exploiting low-cost ATEs. However, automatically generating test 
programs is still an open problem. This paper presents a novel approach ... 

Keywords: FPGA, automatic test generation, hardware accelerator, microprocessor test, 
pipelined architectures, test programs 



26 Keynote address: Visualization challenges for a new cyberpharmaceutical computing paradigm
Russell J. Turner, Kabir Chaturvedi, Nathan J. Edwards, Daniel Fasulo, Aaron L. Halpern,
Daniel H. Huson, Oliver Kohlbacher, Jason R. Miller, Knut Reinert, Karin A. Remington,
Russell Schwartz, Brian Walenz, Shibu Yooseph, Sorin Istrail

October 2001 Proceedings of the IEEE 2001 symposium on parallel and large-data visualization and graphics
Publisher: IEEE Press

Full text available: pdf(3.07 MB) Additional Information: full citation, abstract, references, index terms

In recent years, an explosion in data has been profoundly changing the field of biology and 
creating the need for new areas of expertise, particularly in the handling of data. One vital 
area that has so far received insufficient attention is how to communicate the large 
quantities of diverse and complex information that is being generated. Celera has 
encountered a number of visualization problems in the course of developing tools for 
bioinformatics research, applying them to our data generation ... 

http://portal.acm.org/resu... 2/6/2006

Results (page 2): "ligand" and "genetic programming" and cache

27 Power optimization using divide-and-conquer techniques for minimization of the number of operations
Inki Hong, Miodrag Potkonjak, Ramesh Karri

October 1999 ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 4 Issue 4
Publisher: ACM Press

Full text available: pdf(278.45 KB) Additional Information: full citation, abstract, references, index terms

We introduce an approach for power optimization using a set of compilation and 
architectural techniques. The key technical innovation is a novel divide-and-conquer 
compilation technique to minimize the number of operations for general computations. Our 
technique optimizes not only a significantly wider set of computations than the previously 
published techniques, but also outperforms (or performs at least as well as other 
techniques) on all examples. Along the architectural dimension, we in ... 

Keywords: code generation, transformations 



28 Ad Hoc, self-supervising peer-to-peer search networks
Brian F. Cooper, Hector Garcia-Molina

April 2005 ACM Transactions on Information Systems (TOIS), Volume 23 Issue 2
Publisher: ACM Press

Full text available: pdf(482.31 KB) Additional Information: full citation, abstract, references, index terms

Peer-to-peer search networks are a popular and widely deployed means of searching 
massively distributed digital information repositories. Unfortunately, as such networks 
grow, peers may become overloaded processing messages from other peers. This article 
examines how to reduce the load on nodes in P2P networks by allowing them to self- 
organize into a relatively efficient network, and then self-tune to make the network even 
more efficient. Two local operations used by a peer are introduced: co ... 

Keywords: Peer-to-peer systems, information search and discovery 




29 I/O limitations in parallel molecular dynamics
Terry W. Clark, L. Ridgway Scott, Stanislaw Wlodek, J. Andrew McCammon

December 1995 Proceedings of the 1995 ACM/IEEE conference on Supercomputing (CDROM) - Volume 00 Supercomputing '95
Publisher: ACM Press, IEEE Computer Society

Full text available: pdf(183.44 KB), html(2.39 KB), Publisher Site Additional Information: full citation, abstract, references, index terms

We discuss data production rates and their impact on the performance of scientific 
applications using parallel computers. On one hand, too high rates of data production can 
be overwhelming, exceeding logistical capacities for transfer, storage and analysis. On the 
other hand, the rate limiting step in a computationally-based study should be the human- 
guided analysis, not the calculation. We present performance data for a biomolecular 
simulation of the enzyme, acetylcholinesterase, which uses the ... 

30 Bioinformatics: BIOMIND-protein property prediction by property proximity profiles
Deendayal Dinakarpandian, Vijay Kumar

March 2002 Proceedings of the 2002 ACM symposium on Applied computing
Publisher: ACM Press

Full text available: pdf(501.01 KB) Additional Information: full citation, abstract, references, index terms

We present the infrastructure of a bioinformation system called BIOMIND, which exploits 
the close relationship between the structural and functional properties of proteins. The 
scheme presented here views proteins as composite entities with structural and functional 
properties, and searches are based on distances along each property axis. Explicitly, this 
allows one to frame complex queries using quantitative criteria that confer more discerning 
power than systems based on a text-m ... 

Keywords: data mining, database, proteins, query

31 Bioinformatics: transforming biomedical research and medical care: PetaFLOPS computing
Toshikazu Ebisuzaki, Robert Germain, Makoto Taiji

November 2004 Communications of the ACM, Volume 47 Issue 11
Publisher: ACM Press

Full text available: html(17.09 KB) Additional Information: full citation, abstract, index terms

PetaFLOPS computers— capable of performing a thousand trillion mathematical operations 
per second, 25 times faster than the largest supercomputers today— will open new doors 
to understanding the functions of biological molecules. 



32 A biological programming model for self-healing
Selvin George, David Evans, Steven Marchette

October 2003 Proceedings of the 2003 ACM workshop on Survivable and self-regenerative systems: in association with 10th ACM Conference on Computer and Communications Security
Publisher: ACM Press

Full text available: pdf(1.00 MB) Additional Information: full citation, abstract, references

Biological systems exhibit remarkable adaptation and robustness in the face of widely 
changing environments. By adopting properties of biological systems, we hope to design 
systems that operate adequately even in the presence of catastrophic failures and large 
scale attacks. We describe a programming paradigm based on the actions of biological 
cells and demonstrate the ability of systems built using our model to survive massive 
failures. Traditional methods of system design require explicit p ... 

33 A Framework for Three-Dimensional Simulation of Morphogenesis
Trevor M. Cickovski, Chengbang Huang, Rajiv Chaturvedi, Tilmann Glimm, H. George E.
Hentschel, Mark S. Alber, James A. Glazier, Stuart A. Newman, Jesus A. Izaguirre

October 2005 IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), Volume 2 Issue 4
Publisher: IEEE Computer Society Press

Full text available: pdf(1.62 MB) Additional Information: full citation, abstract

We present CompuCell3D, a software framework for three-dimensional simulation of 
morphogenesis in different organisms. CompuCell3D employs biologically relevant models 
for cell clustering, growth, and interaction with chemical fields. CompuCell3D uses design 
patterns for speed, efficient memory management, extensibility, and flexibility to allow an 
almost unlimited variety of simulations. We have verified CompuCell3D by building a 
model of growth and skeletal pattern formation in the avian (chic ... 

Keywords: Cellular Potts Model (CPM), biological development, reaction-diffusion, cellular 
automata, morphogenesis, Extensible Markup Language (XML). 




34 Artificial life, evolutionary robotics, and adaptive behavior: The predictive basis of situated and embodied artificial intelligence
Keith L. Downing

June 2005 Proceedings of the 2005 conference on Genetic and evolutionary computation GECCO '05
Publisher: ACM Press

Full text available: pdf(172.82 KB) Additional Information: full citation, abstract, references, index terms

While classic AI systems still struggle to properly incorporate common-sense knowledge, 
Situated and Embodied Artificial Intelligence (SEAI) aims to build animats that acquire a 
common-sense understanding of the world via interactions between simulated brains,
bodies and environments. Neuroscientists believe that much of this common sense
involves predictive models for physical activities, but the transfer of sensorimotor skill 
knowledge to cognition is non-trivial, indicating that SEAI may meet ... 

Keywords: artificial intelligence, embodiment, neural networks, situatedness 



35 Learning evaluation functions to improve optimization by local search
Justin Boyan, Andrew W. Moore

September 2001 The Journal of Machine Learning Research, Volume 1
Publisher: MIT Press

Full text available: pdf(643.21 KB) Additional Information: full citation, abstract, citings

This paper describes algorithms that learn to improve search performance on large-scale 
optimization tasks. The main algorithm, STAGE, works by learning an evaluation function 
that predicts the outcome of a local search algorithm, such as hillclimbing or Walksat, from 
features of states visited during search. The learned evaluation function is then used to 
bias future search trajectories toward better optima on the same problem. Another 
algorithm, X-STAGE, transfers previously learned evaluation ... 



36 Quo Vadis evolvable hardware?
Moshe Sipper, Daniel Mange, Eduardo Sanchez
April 1999 Communications of the ACM, Volume 42 issue 4

Publisher: ACM Press

Full text available: pdf(409.06 KB), html(34.33 KB) Additional Information: full citation, references, citings, index terms



37 Real world applications: New evolutionary techniques for test-program generation for
complex microprocessor cores

E. Sanchez, M. Schillaci, M. Sonza Reorda, G. Squillero, L. Sterpone, M. Violante
June 2005 Proceedings of the 2005 conference on Genetic and evolutionary
computation GECCO '05
Publisher: ACM Press

Full text available: pdf(170.77 KB) Additional Information: full citation, abstract, references, index terms

Checking if microprocessor cores are fully functional at the end of the productive process 
has become a major issue. Traditional functional approaches are not sufficient when 
considering modern designs. This paper describes new improvements for an existing 
evolutionary algorithm, called µGP, able to generate Turing-complete programs; these are
exploited, along with hardware acceleration techniques, to add content to a qualifying test 
campaign by automatically generating assembly programs. T ... 

Keywords: automatic test program generation, evolutionary algorithms 



38 ProtoMol, an object-oriented framework for prototyping novel algorithms for molecular
dynamics

Thierry Matthey, Trevor Cickovski, Scott Hampton, Alice Ko, Qun Ma, Matthew Nyerges, Troy
Raeder, Thomas Slabach, Jesus A. Izaguirre

September 2004 ACM Transactions on Mathematical Software (TOMS), volume 30 issue 3
Publisher: ACM Press

Full text available: pdf(911.92 KB) Additional Information: full citation, abstract, references, index terms

ProtoMol is a high-performance framework in C++ for rapid prototyping of novel 
algorithms for molecular dynamics and related applications. Its flexibility is achieved 
primarily through the use of inheritance and design patterns (object-oriented 
programming). Performance is obtained by using templates that enable generation of 
efficient code for sections critical to performance (generic programming). The framework
encapsulates important optimizations that can be used by developers, such as parall ...



Keywords: Fast electrostatic methods, incremental parallelism, molecular dynamics, 
multigrid, multiple time-stepping integration, object-oriented framework. 



39 Modeling methodology b: Parallel distributed simulation and modeling methods: an
algorithm for fully-reversible optimistic parallel simulation
Michael D. Peters, Christopher D. Carothers

December 2003 Proceedings of the 35th conference on Winter simulation: driving
innovation
Publisher: Winter Simulation Conference

Full text available: pdf(157.29 KB) Additional Information: full citation, abstract, references

Typically, large-scale optimistic parallel simulations will spend 90% or more of the total 
execution time forward processing events and very little time executing rollbacks. In fact, 
it was recently shown that a large-scale TCP model consisting of over 1 million nodes will 
execute without generating any rollbacks (i.e., perfect optimistic execution is
achieved). The major cost involved in forward execution is the preparation for a rollback in 
the form of state-saving. Using a t ... 

40 Synchronization and cache coherence in computer design
Mohamad R. Neilforoshan

December 2005 Journal of Computing Sciences in Colleges, Volume 21 issue 2
Publisher: Consortium for Computing Sciences in Colleges

Full text available: pdf(182.24 KB) Additional Information: full citation, abstract, references, index terms

Cache coherence and synchronization are two important issues that a computer designer 
must consider. These topics are typically considered individually and taught to students in 
computer design courses. The first goal of this paper is to show the role that cache 
coherence can play when implementing synchronization primitives. The second goal is to 
illustrate the importance of synchronization techniques as a part of cache coherence's 
overall function in computer design. Finally, the last goal is ...

41 Reducing data cache leakage energy using a compiler-based approach
Wei Zhang, Mahmut Kandemir, Mustafa Karakoy, Guangyu Chen 

August 2005 ACM Transactions on Embedded Computing Systems (TECS), Volume 4 issue 3

Publisher: ACM Press 

Full text available: pdf(750.57 KB) Additional Information: full citation , abstract , references , index terms 

Silicon technology advances have made it possible to pack millions of transistors,
switching at high clock speeds, on a single chip. While these advances bring
unprecedented performance to electronic products, they also pose difficult power/energy 
consumption problems. For example, large number of transistors in dense on-chip cache 
memories consume significant static (leakage) power even if the cache is not used by the 
current computation. While previous compiler research studied code and data ... 

Keywords: Compiler analysis, array-intensive applications, data caches, energy 
optimization, pointer-intensive applications 






42 Optimal methods for coordinated enroute web caching for tree networks
Keqiu Li, Hong Shen, Francis Y. L. Chin, Si Qing Zheng

August 2005 ACM Transactions on Internet Technology (TOIT), Volume 5 issue 3
Publisher: ACM Press

Full text available: pdf(343.64 KB) Additional Information: full citation, abstract, references, citings, index terms

Web caching is an important technology for improving the scalability of Web services. One 
of the key problems in coordinated enroute Web caching is to compute the locations for 
storing copies of an object among the enroute caches so that some specified objectives 
are achieved. In this article, we address this problem for tree networks, and formulate it 
as a maximization problem. We consider this problem for both unconstrained and 
constrained cases. The constrained case includes constraints on th ... 

Keywords: Web caching, autonomous system (AS), dynamic programming, object 
placement (replacement), performance evaluation, tree network 



43 WCRT analysis for a uniprocessor with a unified prioritized cache
Yudong Tan, Vincent J. Mooney 

June 2005 ACM SIGPLAN Notices , Proceedings of the 2005 ACM SIGPLAN/SIGBED 
conference on Languages, compilers, and tools for embedded systems 
LCTES'05, Volume 40 Issue 7 
Publisher: ACM Press

Full text available: pdf(181.49 KB) Additional Information: full citation, abstract, references, index terms

In this paper, we investigate the problem of inter-task cache interference in preemptive 
multi-tasking real-time systems. A prioritized cache is used to reduce cache conflicts 
among tasks by partitioning the cache. Cache partitions are assigned to tasks according to 
their priorities. We extend a known tool, SYMTA, in order to estimate the Worst Case 
Execution Time of tasks executing on a uniprocessor with a unified prioritized L1 cache.
Furthermore, we apply a formal timing analysis approach to ... 

Keywords: cache design, real-time system, timing analysis 



44 A sample-based cache mapping scheme
Rong Xu, Zhiyuan Li

June 2005 ACM SIGPLAN Notices , Proceedings of the 2005 ACM SIGPLAN/SIGBED 
conference on Languages, compilers, and tools for embedded systems 
LCTES'05, Volume 40 Issue 7 
Publisher: ACM Press 

Full text available: pdf(164.54 KB) Additional Information: full citation, abstract, references, index terms

Applications running on the StrongARM SA-1110 or XScale processor cores can specify 
cache mapping for each virtual page to achieve better cache utilization. In this work, we 
describe a method to efficiently perform cache mapping. Under this scheme, we select a 
number of loops for sampling. These loops are selected automatically based on clock 
profiling information. We formulate the optimal cache mapping problem as an Integer 
Linear Programming (ILP) problem. Experiments performed on 14 test prog ... 

Keywords: cache bypass, cache mapping, handheld devices, mini cache, profiling, trace 
sampling 



45 Scalable precision cache analysis for preemptive scheduling
Jan Staschulat, Rolf Ernst

June 2005 ACM SIGPLAN Notices, Proceedings of the 2005 ACM SIGPLAN/SIGBED
conference on Languages, compilers, and tools for embedded systems 
LCTES'05, Volume 40 Issue 7 
Publisher: ACM Press 

Full text available: pdf(226.37 KB) Additional Information: full citation, abstract, references, index terms

Accurate timing analysis is key to efficient embedded system synthesis and integration. 
Caches are needed to increase the processor performance but they are hard to use 
because of their complex behavior especially in preemptive scheduling. Current 
approaches use simplified assumptions or propose exponentially complex analysis 
algorithms to bound the cache related preemption delay at a context switch. Existing 
approaches consider only direct mapped caches or propose non conservative 
approximation ... 

Keywords: cache, embedded systems, scheduling, worst case execution time analysis 



46 Cache aware optimization of stream programs 

Janis Sermulins, William Thies, Rodric Rabbah, Saman Amarasinghe 

June 2005 ACM SIGPLAN Notices , Proceedings of the 2005 ACM SIGPLAN/SIGBED 
conference on Languages, compilers, and tools for embedded systems 
LCTES'05, Volume 40 Issue 7 
Publisher: ACM Press 

Full text available: pdf(218.59 KB) Additional Information: full citation, abstract, references, index terms

Effective use of the memory hierarchy is critical for achieving high performance on 
embedded systems. We focus on the class of streaming applications, which is increasingly 
prevalent in the embedded domain. We exploit the widespread parallelism and regular 
communication patterns in stream programs to formulate a set of cache aware
optimizations that automatically improve instruction and data locality. Our work is in the
context of the Synchronous Dataflow model, in which a program is described a ... 

Keywords: StreamIt, cache, cache optimizations, embedded, fusion, stream programing,
synchronous dataflow 



47 On the design of the local variable cache in a hardware translation-based java virtual
machine
Hitoshi Oi

June 2005 ACM SIGPLAN Notices, Proceedings of the 2005 ACM SIGPLAN/SIGBED
conference on Languages, compilers, and tools for embedded systems
LCTES'05, Volume 40 Issue 7
Publisher: ACM Press

Full text available: pdf(118.36 KB) Additional Information: full citation, abstract, references, index terms

Hardware bytecode translation is a technique to improve the performance of the Java 
Virtual Machine (JVM), especially on the portable devices for which dynamic compilation is 
infeasible. However, since the translation is done on a single bytecode basis, it is likely to 
generate frequent memory accesses for local variables which can be a performance 
bottleneck. In this paper, we propose to add a small register file to the datapath of the 
hardware-translation based JVM and use it as a local variabl ... 

Keywords: hardware-translation, java virtual machine, memory hierarchy 



48 The V-Way Cache: Demand Based Associativity via Global Replacement

Moinuddin K. Qureshi, David Thompson, Yale N. Patt
May 2005 ACM SIGARCH Computer Architecture News, Proceedings of the 32nd
Annual International Symposium on Computer Architecture ISCA '05,
Volume 33 Issue 2
Publisher: IEEE Computer Society, ACM Press

Full text available: pdf(231.93 KB) Additional Information: full citation, abstract, index terms

As processor speeds increase and memory latency becomes more critical, intelligent 
design and management of secondary caches becomes increasingly important. The 
efficiency of current set-associative caches is reduced because programs exhibit a non- 
uniform distribution of memory accesses across different cache sets. We propose a 
technique to vary the associativity of a cache on a per-set basis in response to the 
demands of the program. By increasing the number of tag-store entries relative to the ... 



49 Store Buffer Design in First-Level Multibanked Data Caches

E. F. Torres, P. Ibanez, V. Vinals, J. M. Llaberia
May 2005 ACM SIGARCH Computer Architecture News, Proceedings of the 32nd
Annual International Symposium on Computer Architecture ISCA '05,
Volume 33 Issue 2
Publisher: IEEE Computer Society, ACM Press

Full text available: pdf(293.03 KB) Additional Information: full citation, abstract, index terms

This paper focuses on how to design a Store Buffer (STB) well suited to first-level 
multibanked data caches. Our goal is to forward data from in-flight stores to dependent 
loads with the latency of a cache bank. For that we propose a particular two-level STB 
design in which forwarding is done speculatively from a distributed first-level STB made of 
extremely small banks, while a centralized, second-level STB enforces correct store-load 
ordering a few cycles later. To that end we have identified ... 



50 Adaptive Mechanisms and Policies for Managing Cache Hierarchies in Chip
Multiprocessors

Evan Speight, Hazim Shafi, Lixin Zhang, Ram Rajamony

May 2005 ACM SIGARCH Computer Architecture News, Proceedings of the 32nd
Annual International Symposium on Computer Architecture ISCA '05,
Volume 33 Issue 2
Publisher: IEEE Computer Society, ACM Press

Full text available: pdf(128.39 KB) Additional Information: full citation, abstract, citings, index terms

With the ability to place large numbers of transistors on a single silicon chip, 
manufacturers have begun developing chip multiprocessors (CMPs) containing multiple 
processor cores, varying amounts of level 1 and level 2 caching, and on-chip directory 
structures for level 3 caches and memory. The level 3 cache may be used as a victim 
cache for both modified and clean lines evicted from on-chip level 2 caches. Efficient area 
and performance management of this cache hierarchy is paramount given th ... 

51 Direct Cache Access for High Bandwidth Network I/O
Ram Huggahalli, Ravi Iyer, Scott Tetrick

May 2005 ACM SIGARCH Computer Architecture News, Proceedings of the 32nd
Annual International Symposium on Computer Architecture ISCA '05,
Volume 33 Issue 2
Publisher: IEEE Computer Society, ACM Press

Full text available: pdf(194.52 KB) Additional Information: full citation, abstract, index terms

Recent I/O technologies such as PCI-Express and 10Gb Ethernet enable unprecedented 
levels of I/O bandwidths in mainstream platforms. However, in traditional architectures, 
memory latency alone can limit processors from matching 10 Gb inbound network I/O 
traffic. We propose a platform-wide method called Direct Cache Access (DCA) to deliver 
inbound I/O data directly into processor caches. We demonstrate that DCA provides a 
significant reduction in memory latency and memory bandwidth for receive in ... 

52 Special issue: dasCMP'05: Exploring the cache design space for large scale CMPs
Lisa Hsu, Ravi Iyer, Srihari Makineni, Steve Reinhardt, Donald Newell

November 2005 ACM SIGARCH Computer Architecture News, volume 33 issue 4

Publisher: ACM Press 

Full text available: pdf(347.09 KB) Additional Information: full citation , abstract , references 

With the advent of dual-core chips in the marketplace, small-scale CMP (chip 
multiprocessor) architectures are becoming commonplace. We expect a continuing trend of 
increasing the number of cores on a die to maximize the performance/power efficiency of 
a single chip. We believe an era of large-scale CMPs (LCMPs) with several tens to hundreds 
of cores is on the way, but as of now architects have little understanding of how best to 
build a cache hierarchy given such a large number of cores/threads ... 

53 Energy-conserving data cache placement in sensor networks
K. Shashi Prabh, Tarek F. Abdelzaher

November 2005 ACM Transactions on Sensor Networks (TOSN), Volume 1 issue 2

Publisher: ACM Press

Full text available: pdf(662.77 KB) Additional Information: full citation, abstract, references, index terms

Wireless sensor networks hold a very promising future. The nodes of wireless sensor 
networks (WSN) have a small energy supply and limited bandwidth available. Since radio 
communication is expensive in terms of energy consumption, the nodes typically spend 
most of their energy reserve on wireless communication (rather than on CPU processing) 
for data dissemination and retrieval. Therefore, the role of energy conserving data 
communication protocols and services in WSN can not be overemphasized. Ca ... 

Keywords: Energy and bandwidth management, Steiner tree, asynchronous multicast, 
data caching, foundations of sensor networks 



54 Regular contributions: Exploiting the replication cache to improve performance for
multiple-issue microprocessors
Bramha Allu, Wei Zhang

June 2005 ACM SIGARCH Computer Architecture News, volume 33 issue 3
Publisher: ACM Press

Full text available: pdf(220.65 KB) Additional Information: full citation, abstract, references

Performance and reliability are both of great importance for microprocessor design.
Recently, the replication cache has been proposed to enhance data cache reliability against
soft errors. The replication cache is a small fully associative cache to store the replica for
every write to the L1 data cache. In addition to enhance data reliability, this paper
proposes several cost-effective techniques to improve performance of multiple-issue 
microprocessors by exploiting the replication cache. The id ... 

55 MEDEA 2004 workshop: Improving data cache performance with integrated use of
split caches, victim cache and stream buffers
Afrin Naz, Mehran Rezaei, Krishna Kavi, Philip Sweany
June 2005 ACM SIGARCH Computer Architecture News, volume 33 issue 3

Publisher: ACM Press

Full text available: pdf(257.60 KB) Additional Information: full citation, abstract, references

In our prior work we explored a cache organization providing architectural support for 
distinguishing between memory references that exhibit spatial and temporal locality and 
mapping them to separate caches. That work showed that using separate (data) caches for
indexed or stream data and scalar data items could lead to substantial improvements in 
terms of cache misses. In addition, such a separation allowed for the design of caches that 
could be tailored to meet the properties exhibited by diffe ... 

Keywords: array cache, memory access time, scalar cache, stream buffer, victim cache 




56 MEDEA 2004 workshop: Locality analysis to control dynamically way-adaptable
caches

Hiroaki Kobayashi, Isao Kotera, Hiroyuki Takizawa

June 2005 ACM SIGARCH Computer Architecture News, Volume 33 issue 3

Publisher: ACM Press

Full text available: pdf(513.49 KB) Additional Information: full citation, abstract, references

This paper presents a control mechanism for dynamically way-adaptable caches. The 
mechanism uses the local and global information about the locality of reference during 
execution. As the local information, the cache access pattern is evaluated based on the 
statistics of the LRU (Least-Recently Used) states of cache entries referenced. If the 
memory accesses are concentrated on and near the most recently used entries, the 
mechanism knows that the locality of reference is very high and there is ro ... 

57 Building adaptable cache services

Laurent d'Orazio, Fabrice Jouanot, Cyril Labbe, Claudia Roncancio

November 2005 Proceedings of the 3rd international workshop on Middleware for grid
computing MGC '05

Publisher: ACM Press

Full text available: pdf(376.15 KB) Additional Information: full citation, abstract, references, index terms

Caching is crucial to improve performances in many computing systems. It is context 
dependent, thus many types of cache exist. As a consequence, when a cache is required, 
it is usually built from scratch. Such a solution is time (and money) consuming, in 
particular in data grid context where several caches may be required. This paper proposes 
ACS (Adaptable Cache Service), a framework which allows building adaptable cache 
services. It presents a generic cache definition and provides a descriptio ... 

Keywords: adaptability, cache, component, framework, grid, middleware 



58 Thermal Management of On-Chip Caches Through Power Density Minimization
Ja Chun Ku, Serkan Ozdemir, Gokhan Memik, Yehea Ismail

November 2005 Proceedings of the 38th annual IEEE/ACM International Symposium
on Microarchitecture MICRO 38
Publisher: IEEE Computer Society

Full text available: pdf(371.02 KB), Publisher Site Additional Information: full citation, abstract

Various architectural power reduction techniques have been proposed for on-chip caches 
in the last decade. However, these techniques mostly ignore the effects of temperature on 
the power consumption. In this paper, first we show that these power reduction 
techniques can be suboptimal when thermal effects are considered. Particularly, we 
propose a thermal-aware cache powerdown technique that minimizes the power density of 
the active parts by turning off alternating rows of memory cells instead of ... 



59 A highly configurable cache for low energy embedded systems
Chuanjun Zhang, Frank Vahid, Walid Najjar

May 2005 ACM Transactions on Embedded Computing Systems (TECS), Volume 4 issue 2

Publisher: ACM Press

Full text available: pdf(714.89 KB) Additional Information: full citation, abstract, references, index terms

Energy consumption is a major concern in many embedded computing systems. Several 
studies have shown that cache memories account for about 50% of the total energy
consumed in these systems. The performance of a given cache architecture is determined, 
to a large degree, by the behavior of the application executing on the architecture. 
Desktop systems have to accommodate a very wide range of applications and therefore 
the cache architecture is usually set by the manufacturer as a best compr ... 

Keywords: Cache, architecture tuning, configurable, embedded systems, low energy, low 
power, memory hierarchy, microprocessor 



60 Cache Refill/Access Decoupling for Vector Machines
Christopher Batten, Ronny Krashinsky, Steve Gerding, Krste Asanovic
December 2004 Proceedings of the 37th annual IEEE/ACM International Symposium
on Microarchitecture MICRO 37
Publisher: IEEE Computer Society

Full text available: pdf(319.32 KB) Additional Information: full citation, abstract

Vector processors often use a cache to exploit temporal locality and reduce memory 
bandwidth demands, but then require expensive logic to track large numbers of 
outstanding cache misses to sustain peak bandwidth from memory. We present 
refill/access decoupling, which augments the vector processor with a Vector Refill Unit 
(VRU) to quickly pre-execute vector memory commands and issue any needed cache line 
refills ahead of regular execution. The VRU reduces costs by eliminating much of the 
outstan ...